Setting up R to run ecomix

You can install the development version from github at ecomix, using the following R code. This will install the master branch which should be tested and working

library(devtools)
install_github("skiptoniam/ecomix@dev")

Species Archetype Models

Species Archetype Models (SAMs) describes how an homogeneous group of species varies with the environment. The environmental gradients are represented by covariates in the model. We refer to this model as species_mix in the ecomix package. We describe the data as \(y_{ij}\), for i=1…n sampling sites, j=1…S species and the unobserved k=1…K archetypes. The model, conditional on archetype group \(\textbf{z}_k\), is

\[ \begin{equation} h[\mathbb{E}(y_{ij}|\textbf{z}_k)] = \alpha_j + g_k(X_{i}^\top\beta_{k}) + \nu_i \tag{1}\label{eq:one} \end{equation} \]

where \(Pr(z_{k}) = \pi_k,\) and \(\sum^K_{k=1}{\pi_k=1}\). \(\vec{X}_i\) is a design matrix of covariates at site \(i\) used to describe the archetype groups and \(\nu_i\) is a natural log-offset. The functional form of \(g_k(.)\) can be specified to be any function commonly used within a Generalized Linear Model framework. Including linear, quadratic, spline and interaction terms. Additionally an offset term \(\nu_i\) can be included to account for sampling artefacts and are included into the model on a log-scale (e.g. log(area sampled)). We refer to the model as the species_mix in the ecomix package.

A simulation study using Species Archetype Models.

Here we present an example using the Species Archetype Models (SAMs) from our ecomix package. Here we present a simulation study to demonstrate the functionality of species_mix.

Biological and Environmental Data

We simulate a set of synthetic species to be fitted using the species_mix function. We generate these 100 species using a multivariate normal mixture model, with a Bernoulli realization of the expected means to generate presence and absence data for a random 200 sites derived from our study extent. We simulate a set of environment data which will be used as covariates in the Species Archetype Models.

library(ecomix)
library(raster)
## Loading required package: sp
library(sp)
library(reshape2)
library(ggplot2)
library(grid)
library(gridExtra)
library(RandomFields)
## Loading required package: RandomFieldsUtils
## 
## Attaching package: 'RandomFields'
## The following object is masked from 'package:RandomFieldsUtils':
## 
##     RFoptions
library(knitr)

set.seed(007)
lenny <- 250
xSeq <- seq( from=0, to=1, length=lenny)
ySeq <- seq( from=0, to=1, length=lenny)
X <- expand.grid( x=xSeq, y=ySeq)
#define four covariates within the study area
Mod1 <- RMgauss( var=1, scale=0.2) + RMnugget( var=0.01)
Mod2 <- RMgauss( var=1, scale=0.5) + RMnugget( var=0.01)
Mod3 <- RMgauss( var=1, scale=1)# + RMnugget( var=0.01)
Mod4 <- RMgauss( var=2, scale=0.1) + RMnugget( var=1)
simmy1 <- RFsimulate( Mod1, x=xSeq, y=ySeq)
## New output format of RFsimulate: S4 object of class 'RFsp';
## for a bare, but faster array format use 'RFoptions(spConform=FALSE)'.
simmy2 <- RFsimulate( Mod2, x=xSeq, y=ySeq)
simmy3 <- RFsimulate( Mod3, x=xSeq, y=ySeq)
simmy4 <- RFsimulate( Mod4, x=xSeq, y=ySeq)
X <- cbind( X, as.numeric( as.matrix( simmy1)), as.numeric( as.matrix( simmy2)),
            as.numeric( as.matrix( simmy3)), as.numeric( as.matrix( simmy4)))
X[,-(1:2)] <- apply( X[,-(1:2)], 2, scale)
colnames( X) <- c("x","y","covar1","covar2","covar3","covar4")
env <- rasterFromXYZ( X)
names(env) <- c("Temperature", "Oxygen", "Depth", "Productivity")
env.df <- as.data.frame(env,xy=TRUE)
env_dat<-rasterToPoints(env)
save(env, env.df, env_dat, file="data/sim_env.RData")

plot our simulated marine environment

Simulated environmental variables used in Species Archetype Modelling

Simulated environmental variables used in Species Archetype Modelling

Simulate Biological Data

We simulated a set of synthetic species to be fitted using the species_mix function. We generated the expected species intercepts \(\alpha_j\) from a beta distribution, and assign known group level covariates \(\beta_k\). \(\beta_k\) represents the archetype (group) response to each covariate in the model. We simulated species archetypal responses using the species_mix.simulate function. If no known parameters are provided random parameters will be generated for the formula and data provided. Here we provided parameters for the species intercepts (alphas) and the archetype mean responses (betas).

Species intercepts

First up lets simulate some intercepts - refereed to as alphas (\(\alpha_j\)) in the model - we do this from a beta distribution

set.seed(42)
nsp <- 100
betamean <- 0.3
betabeta <- 2
betaalpha <- betamean/(1-betamean) 
alpha <- rbeta( nsp, betaalpha, betabeta) #prevalence with mean of betaalpha/(betaalpha+betabeta)
alphas <- log( alpha / ( 1-alpha))  #put them on the right scale-- but note that this is conditional on all covars being zero

# curve( dbeta( x, betaalpha, betabeta), from=0.001, to=0.999) 
#the distribution that it is drawn from this density. Basically we want some thing that looks like "real world" community data (with lost of rarer things)
# plot(density(alphas), main="Alphas")
prev<-exp(alphas)/(1+ exp(alphas))
par(mfrow=c(1,2))
curve( dbeta( x, betaalpha, betabeta), from=0.001, to=0.999) #the distribution that it is drawn 
hist(prev, main="Prevalence")

Archetype parameters

Secondly, lets create some betas (\(\beta_k\)), these are the slope parameter values for each of the archetypes. Now, I’ve just made up some value here, but you can play around with these in your spare time and see how they change the model.

We are using quadratic terms here, but you could play around with or other functional forms for the archetype responses. You should be able to fit most terms available in GLM, including unpenalised splines (but this still needs a little testing).

betas <- as.matrix(data.frame(Temperature=c(0.75,0,1.5),
                              Temperature2=c(-0.75,0,0),
                              Oxygen=c(0,0.5,0),
                              Oxygen2=c(0,-0.5,0),
                              Depth= c(-1.5,0.5,1.5),
                              Depth2= c(0,-0.5,0),
                              Productivity=c(1,0,-2.5),
                              Productivity2=c(-1,0,0),
                              Time=rnorm(3)))
rownames(betas) <- paste0("Archetype.",1:3)
print(betas)
##             Temperature Temperature2 Oxygen Oxygen2 Depth Depth2 Productivity
## Archetype.1        0.75        -0.75    0.0     0.0  -1.5    0.0          1.0
## Archetype.2        0.00         0.00    0.5    -0.5   0.5   -0.5          0.0
## Archetype.3        1.50         0.00    0.0     0.0   1.5    0.0         -2.5
##             Productivity2       Time
## Archetype.1            -1  0.2946543
## Archetype.2             0 -0.2792594
## Archetype.3             0 -1.3362367

Now based on the environmental data simulated above, we can generate a realisation of the data from a set of random sites.

sim_dat <- data.frame(intercept=1,env_dat[,3:ncol(env_dat)])
sites<-sample(1:nrow(sim_dat), 200, replace=FALSE)
env_200<-sim_dat[sites,]
env_200$Time <- sample(c(0,1),size = 200, replace = TRUE)

Formula structure in ecomix

Now one thing to be aware of in species_mix and regional_mix is how we use the left hand side of the formula to setup which species are in the model. This might seem a bit clunky to start, but once you start modelling it’s really useful as you can change the species response just by changing the formula, as you would do with covariates on the right hand side of the formula.

No another confusing thing, is we declare the species component (left hand side of the formula) in the archetype_formula or the rcp_formula.

The species formula is for the species-specific things in the model. For most standard species archetype models this is left just as an intercept. We could extend this later with partial SAMs, where we want to put a set of covariates on the species.

Setting up the formula will look something like this:

sam_form1 <- stats::as.formula(paste0('cbind(',paste(paste0('spp',1:10),
collapse = ','),")~Temperature+Oxygen+Depth+Productivity"))
sam_form2 <- stats::as.formula(paste0('cbind(',paste(paste0('spp',sample(100,20)),
collapse = ','),")~Temperature+Oxygen+Depth+Productivity"))
sam_form <- stats::as.formula(paste0('cbind(',paste(paste0('spp',1:100),
collapse = ','),")~Temperature+Oxygen+Depth+Productivity"))
sp_form <- ~1
print(sam_form1)
## cbind(spp1, spp2, spp3, spp4, spp5, spp6, spp7, spp8, spp9, spp10) ~ 
##     Temperature + Oxygen + Depth + Productivity
print(sam_form2)
## cbind(spp35, spp3, spp73, spp64, spp45, spp91, spp29, spp95, 
##     spp42, spp17, spp20, spp97, spp49, spp76, spp30, spp51, spp74, 
##     spp33, spp7, spp25) ~ Temperature + Oxygen + Depth + Productivity
print(sam_form)
## cbind(spp1, spp2, spp3, spp4, spp5, spp6, spp7, spp8, spp9, spp10, 
##     spp11, spp12, spp13, spp14, spp15, spp16, spp17, spp18, spp19, 
##     spp20, spp21, spp22, spp23, spp24, spp25, spp26, spp27, spp28, 
##     spp29, spp30, spp31, spp32, spp33, spp34, spp35, spp36, spp37, 
##     spp38, spp39, spp40, spp41, spp42, spp43, spp44, spp45, spp46, 
##     spp47, spp48, spp49, spp50, spp51, spp52, spp53, spp54, spp55, 
##     spp56, spp57, spp58, spp59, spp60, spp61, spp62, spp63, spp64, 
##     spp65, spp66, spp67, spp68, spp69, spp70, spp71, spp72, spp73, 
##     spp74, spp75, spp76, spp77, spp78, spp79, spp80, spp81, spp82, 
##     spp83, spp84, spp85, spp86, spp87, spp88, spp89, spp90, spp91, 
##     spp92, spp93, spp94, spp95, spp96, spp97, spp98, spp99, spp100) ~ 
##     Temperature + Oxygen + Depth + Productivity
print(sp_form)
## ~1

Now let’s simulate the data! We have set up the covariates for our Bernoulli model, so we can now generate the data we will fit using the species_mix.

sam_form <- stats::as.formula(paste0('cbind(',paste(paste0('spp',1:100),
collapse = ','),")~poly(Temperature,degree=2,raw=TRUE)+poly(Oxygen,degree=2,raw=TRUE)+poly(Depth,degree=2,raw=TRUE)+poly(Productivity,degree=2,raw=TRUE)+Time"))
sp_form <- ~1
simulated_data200 <- species_mix.simulate(archetype_formula=sam_form,
                                       species_formula=sp_form,
                                       alpha = alphas,
                                       beta = betas,
                                       data = env_200,
                                       nArchetypes = 3,
                                       family = binomial())

Now we can have a look at the data structure for the simulated data

kable(simulated_data200[1:5,c(1:5,101:105)])
spp1 spp2 spp3 spp4 spp5 const poly.Temperature..degree…2..raw…TRUE.1 poly.Temperature..degree…2..raw…TRUE.2 poly.Oxygen..degree…2..raw…TRUE.1 poly.Oxygen..degree…2..raw…TRUE.2
6843 0 0 0 0 0 1 1.0349351 1.0710906 1.6109003 2.5949998
7110 0 0 0 0 0 1 1.4925039 2.2275680 1.2200102 1.4884250
54377 0 0 0 0 0 1 -0.2334823 0.0545140 -0.8027139 0.6443496
30808 0 0 0 1 0 1 -0.5465535 0.2987207 0.0562737 0.0031667
18359 0 0 0 0 0 1 0.7693061 0.5918319 1.8652395 3.4791185

Additionally all the simulated parameters and data structure information are stored as attribute’s in the simulated data object

names(attributes(simulated_data200))
##  [1] "names"     "class"     "row.names" "SAMs"      "pi"        "alpha"    
##  [7] "beta"      "mu"        "size"      "offset"

Model fitting and evaluation

Inspection of the simulated data shows we have generated occurrence records for 100 synthetic species, we assume that across our study domain there were 200 randomly surveyed sites. This means we can model the data using a Bernoulli error distribution. Because we have presence and absences data (occurrences of species at sites) we need to a use a Bernoulli probability model for the observation of species at sites. We can do this by selecting setting the family='Bernoulli' or family=Binomial() in the species_mix function.

We also need to consider a few other data properties before fitting the model:

  • Should we model all the 100 species?
  • How many of the 100 species should we include in the analysis?
  • What covariates and what covariate functional form are likely to be important for driving the distributions of species and archetypes?
  • How many groups should we choose to represent our 100 species?

First off we can look at the number of observations of each species across all the sites. We can see there are a large number of species with less than 10 occurrences. This is a common feature of multiple species data sets, where there are often long tails of rare species. In this example, we may chose to remove the rare species (< 10 occurrences across all sites). These rarer species could be potentially included in the model (within reason!), but would likely create noise and unexplained variance, making the model harder to fit and estimate.

For example, work by Hui et al. (2013), removed all species with less that five records across all sites. We will do this by removing all species with less than ten occurrences across out sites. Typically, we can include a greater number of rarer species in species mix models compared to single species SDM methods, which often remove species with occurrence counts less than 20, 30 or 50 observations (or more) (Hui et al. 2013).

Figure 2. The simulated occurrences for the 100 species. We will remove all species with less than 10 presences across all 200 sites.

Figure 2. The simulated occurrences for the 100 species. We will remove all species with less than 10 presences across all 200 sites.

Fitting a Species Archetype Model shares vary similar methodological steps as a standard regression or GLM in R. Once we have selected the response data, we consider which covariates will contribute to explaining the distributions of simulated species across the sites. From a SDM perspective, this might translate to how key environmental gradients shape the niche of the species observed in our geographical extent. The right side of the formula in the archetype_formula is meant to represent the functional form and the covariates which will describe the archetypes. To assign which species will contribute to the archetypes we need to generate the left hand side of the archetype_formula. This has the following specific form: cbind(spp_1,...,spp_i). This structure is used so species can be added or removed from the model based on the formula, rather that having to restructure the response data (which is commonly done in multiple species models). The species_formula will remain as the default ~1 setting, and assumes species-specific intercepts. This is the resulting formula for the species archetypes in the species_mix model.

Below is a small code block with a basic example on how to fit a species_mix model.

## load the ecomix package
library(ecomix)
 
## Select species with greater than ten occurrences across all sites.
spdata <- simulated_data200[,1:100] 
spdata <- spdata[,-which(colSums(simulated_data200[,1:100])<10)] 
samdat_10p <- cbind(spdata,env_200)
samdat_10p$Time <- as.factor(ifelse(samdat_10p$Time>0,"Night","Day"))

## Archetype formula
archetype_formula <- as.formula(paste0(paste0('cbind(',paste(colnames(samdat_10p)[grep("spp",colnames(samdat_10p))],collapse = ", "),") ~ poly(Temperature,degree=2,raw=TRUE)+poly(Oxygen,degree=2,raw=TRUE)+poly(Depth,degree=2,raw=TRUE)+poly(Productivity,degree=2,raw=TRUE)+Time")))

## Species formula
species_formula <- ~ 1

## Fit a single model
sam_fit <- species_mix(archetype_formula = archetype_formula, # Archetype formula
                       species_formula = species_formula,    # Species formula
                       data = samdat_10p,             # Data
                       nArchetypes = 3,               # Number of groups (mixtures) to fit
                       family = binomial(),           # Which family to use
                       control = list(quiet = FALSE)) # Print all the outputs?
## initial  value 4567.509648 
## iter  10 value 4563.323918
## iter  20 value 4561.007849
## iter  30 value 4560.496032
## iter  40 value 4560.277856
## iter  50 value 4560.267556
## iter  60 value 4560.264135
## iter  70 value 4560.259971
## final  value 4560.259360 
## converged

Multiple fits and group selection.

For most mixture models, the likelihood surface is pretty complicated so, we often want to either use multiple ECM refits to get good starting values (within a single model), or alternatively we could fit multiple models using stochastic starting values. Now we can do a multiple fitting to see what which number of groups is appropriate for the data.

One challenge when developing SAMs (or any finite mixture model) is selecting \(k\); the number of archetypes (groups) in the model. The number of archetypes is latent, so must be estimated from the data and the functional form of the covariates. In this example, we know that the optimal number of groups for these data is three, because we simulated the data with these characteristics. However, \(k\) is generally not known in real world applications. So one import part of the fitting process is finding \(k\), having said that it is also totally fine to define \(k\) for say a management objective. We can estimate the “best” number of groups based on the most parsimonious fit to the data. We can do this based on the model log-likelihood, and information criterion such as BIC. We provide a function species_mix.multifit which can assist in group selection if a vector of archetypes is provided. By setting the number of starts nstarts to \(>\) than one, we will also use a multiple fitting approach for estimating the log-likelihood. Below is an example of how one might do this.

nArchetypes <- 1:6
sam_multifit <- species_mix.multifit(archetype_formula = archetype_formula, # Archetype formula
                                     species_formula = species_formula,     # Species formula
                                     data = samdat_10p,                     # Data
                                     nArchetypes = nArchetypes,             # Number of groups (mixtures) to fit
                                     nstart = 5,                            # The number of fits per archetype.
                                     family = binomial(),                  # Which family to use
                                     control = list(quiet = TRUE,ecm.prefit=FALSE))
saveRDS(sam_multifit,"model_results/simulated_sams_5_multifit_bernoulli.rds")
sam_multifit <- readRDS("model_results/simulated_sams_5_multifit_bernoulli.rds")
plot(sam_multifit,type="BIC")
Group selection from the multiple fit function, we can see that three archetypes is the best fit to these data based on Bayesian Information Criterion (BIC).

Group selection from the multiple fit function, we can see that three archetypes is the best fit to these data based on Bayesian Information Criterion (BIC).

print(sam_multifit)
## A multiple fit bernoulli species_mix model object
## 
## You fitted models with  1, 2, 3, 4, 5, 6 Archetypes.
## 
## A total of 5 random starts were fitted per Archetype.
## 
## The best model based on BIC has 3 archetypes
##  species_mix model
## 
## Pi
## Archetype1 Archetype2 Archetype3 
##      0.315      0.452      0.233 
## 
## Coefficients
## $alpha
##        spp3        spp4        spp6        spp8       spp10       spp11 
## -2.47503738 -1.18704267 -0.85955031 -1.56675905 -0.87338827 -0.88703908 
##       spp12       spp13       spp14       spp16       spp19       spp20 
## -0.26921446  0.09134039 -0.44421340 -4.62797924 -2.21150649 -2.38078173 
##       spp21       spp23       spp24       spp25       spp27       spp28 
## -0.04404255  1.56561657 -3.38480464 -0.39394159 -3.23083727 -1.93213104 
##       spp30       spp31       spp33       spp34       spp35       spp38 
## -4.33016926 -1.90803152 -0.15199494 -2.78735645 -3.15549304 -1.58172989 
##       spp39       spp41       spp43       spp45       spp46       spp48 
##  0.39682611 -4.33035258  1.66213649 -2.18513249 -1.21510390 -0.58430470 
##       spp49       spp50       spp51       spp53       spp54       spp55 
## -0.90253096  0.35508742 -1.64347021  1.05674781 -1.83912404  0.17692378 
##       spp56       spp57       spp58       spp59       spp60       spp61 
## -1.10025483 -2.57272606 -2.47495213 -1.97970066 -1.78899834  0.34136188 
##       spp62       spp63       spp64       spp65       spp66       spp67 
## -0.34526091  0.67048728 -1.74740501 -2.13110836 -1.04322028  0.25294563 
##       spp68       spp70       spp72       spp74       spp75       spp76 
## -1.99462353 -1.33110628 -1.33116349 -0.06605292 -1.48998581 -1.04067039 
##       spp77       spp78       spp79       spp80       spp81       spp83 
##  0.24600086 -1.38953462  0.06720008  0.43419897  0.39605126 -1.70678703 
##       spp84       spp85       spp86       spp87       spp88       spp91 
## -0.58441212  0.20088742 -1.44829875 -0.77618270 -1.74754206 -1.42507016 
##       spp92       spp93       spp96       spp97       spp98       spp99 
## -4.52532055 -0.53810814 -0.98648320 -3.87756042 -1.70689544 -1.93212856 
##      spp100 
## -2.79081080 
## 
## $beta
##            poly(Temperature, degree = 2, raw = TRUE)1
## Archetype1                                0.738575618
## Archetype2                                1.625253949
## Archetype3                                0.009900929
##            poly(Temperature, degree = 2, raw = TRUE)2
## Archetype1                                -0.63595509
## Archetype2                                -0.06616253
## Archetype3                                 0.03162038
##            poly(Oxygen, degree = 2, raw = TRUE)1
## Archetype1                            0.01993459
## Archetype2                           -0.06547675
## Archetype3                            0.41433151
##            poly(Oxygen, degree = 2, raw = TRUE)2
## Archetype1                           -0.04251692
## Archetype2                           -0.02052845
## Archetype3                           -0.40603162
##            poly(Depth, degree = 2, raw = TRUE)1
## Archetype1                           -1.5920386
## Archetype2                            1.3336397
## Archetype3                            0.5779883
##            poly(Depth, degree = 2, raw = TRUE)2
## Archetype1                           0.04918131
## Archetype2                           0.09676774
## Archetype3                          -0.66341465
##            poly(Productivity, degree = 2, raw = TRUE)1
## Archetype1                                  0.92300377
## Archetype2                                 -2.54353421
## Archetype3                                  0.03345925
##            poly(Productivity, degree = 2, raw = TRUE)2  TimeNight
## Archetype1                                 -0.95361922  0.3583112
## Archetype2                                 -0.05923970 -1.4459038
## Archetype3                                 -0.02632381 -0.3440429
## 
## You can access more elements of this model using x$multiple_fits[[ 3 ]][[ 1 ]]

Now we can select the “best” model and assess the model outputs. We can do this based on BIC, in this case, I simply ask R which model BIC is the smallest and use that as our best model for now. Model selection is actually pretty difficult for mixture models, and we tend to do it in two steps. First, we select the best number of groups based on the full covariate model, and then we might do a backwards, forward, all model selection on the archetype covariates with a fix number of groups (based on the previous steps). I’m actually going to skip the covariate selection step, because we know that all the parameters in the model contribute (because it simulated data). But if you were fitting this model to real world data you might want to think about covariate selection here.

Residuals

We can also do model diagnostics using residuals in ecomix, by default we use random-quantile-residuals (RQR), sometimes refereed to as Dunn-Symth residuals (Dunn & Smyth 1996).

plot(sam_fit)

They look pretty good :)

We can also inspect troublesome species using the species call, which will plot the residual of the species by using the name of the species, the name should match that used in the model formula.

plot(sam_fit,species=sam_fit$names$spp[1])

Model Uncertainty

There are a few ways we can estimate uncertainty in the ecomix package. We can do it via a numerical estimation of the hessian (based on the analytically gradient, which is calculated in the c++ code). We can use that to calculate the variance-covariance matrix. Once we have the variance covariance matrix we tend to use a Monte Carlo simulation of the uncertain by drawing random realization from a multivariate normal using the model means and vcov as inputs. This approach is quick, but can be a bit unstable and can results can fail if the hessian cannot be inverted.

So a more reliable approach is to use a Bayesian Bootstrap (Rubin 1981). This approach is very similar to case resampling bootstrap, but it does not remove any site’s information from the model (which could potentially bias outcomes for small data sets).

Run bootstrap, I just ran 20 bootstraps here for convenience and speed, but you might want to run say 100 during model testing and maybe 1000 when you have the final model (clearly the size of the model will slow this down so be pragmatic about this choice)

sam_boot <- bootstrap(sam_fit,nboot = 20, quiet = TRUE)
## ....................

Partial responses

We can also plot the partial responses of covariates as shown in Dunstan et al. (2011) & Hui et al. (2013). The idea behind this approach is to understand the response of archetypes to a focal predictor. Firstly, we need to set up a data.frame which allows the focal.predictor to vary and averages over the other effects in the model. We demonstrate how to this in the next code chunk. We can include estimates of uncertainty in the partial response plots by including a bootstrap object into the plot function, otherwise the mean response is plotted without uncertainty in the effect plots.

par(mfrow=c(2,3))
eff.df <- effects_data(focal.predictors = c("Temperature","Oxygen","Productivity","Depth","Time"), sam_fit)
plot(x = eff.df, object = sam_fit, boot.object = sam_boot, ylim = c(0,1))
Partial responses of archetypes for each covariate in the model.

Partial responses of archetypes for each covariate in the model.

You can plot the responses on the “link” scale, which can sometimes help reconcile differences in counts/biomass.

par(mfrow=c(2,3))
plot(x = eff.df, object = sam_fit, boot.object = sam_boot, type="link")
Partial responses of Archetype on the link scale.

Partial responses of Archetype on the link scale.

We can plot all the species responses, which is not that interesting for this dataset, as we have simulated the data from a known set of covariates.

par(mfrow=c(2,3))
plot(x = eff.df, object = sam_fit, boot.object = sam_boot, ylim=c(0,1), response.var = "Species")
Partial responses of species for each covariate in the model.

Partial responses of species for each covariate in the model.

We can also sum the responses across the gradient, which might represent something like species richness, or it could be total abundance if say the data was counts.

par(mfrow=c(2,3))
plot(x = eff.df, object = sam_fit, boot.object = sam_boot, response.var = "SpeciesSum")
Partial responses of the sum across all species for each covariate in the model.

Partial responses of the sum across all species for each covariate in the model.

You can also export plotting single species and single archetypes by passing the response.var the name of the species or archetype in the model. Looking at object$names well help with this.

Species membership

We are often interested want to know which species belong to each archetype. We can do this by inspecting the posterior species membership to each archetype: \(\mathbb{E}(\textbf{z}_{jk}) = \tau_{jk}\)

We can look at them directly in the model using object$tau, or we can call the species_membership function, which can also be used for plotting.

tau <- species_membership(sam_fit)
plot(tau,margins = c(6,4),cexCol=1)
Species membership to each archetype.

Species membership to each archetype.

Model prediction

We can generate predictions for the each archetype in ecomix using predict() function. The default is used to generate a point mean estimate for each archetype. With the inclusion of a bootstrap object we can also provide confidence intervals or standard error for each prediction.

env.df$Time <- factor("Night",levels=c("Day","Night"))
sam3_pred <- predict(object=sam_fit, boot.object=sam_boot, newdata=env.df)
## Using supplied species_mix.bootstrap object (non-parametric bootstrap)
The predicted probability of each species archetype across the simulated environment and the standard error of the predictions generated based on the Bayesian bootstrap., cache=TRUE

The predicted probability of each species archetype across the simulated environment and the standard error of the predictions generated based on the Bayesian bootstrap., cache=TRUE

Or you can plot the lower and upper confidence intervals along side the means.
The predicted probability of each species archetype across the simulated environment with confidence intervals.

The predicted probability of each species archetype across the simulated environment with confidence intervals.

References

Dunn, P.K. & Smyth, G.K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5, 236.
Dunstan, P.K., Foster, S.D. & Darnell, R. (2011). Model based grouping of species across environmental gradients. Ecological Modelling, 222, 955–963.
Hui, F.K., Warton, D.I., Foster, S.D. & Dunstan, P.K. (2013). To mix or not to mix: Comparing the predictive performance of mixture models vs. Separate species distribution models. Ecology, 94, 1913–1919.
Rubin, D.B. (1981). The bayesian bootstrap. The annals of statistics, 130–134.